Collaborator

@ScottDugas ScottDugas commented Sep 9, 2025

This introduces a new KeySpacePath.importData that will import DataInKeySpacePath as gathered by KeySpacePath.exportAllData.

The new method works when importing data exported from other clusters.

Resolves: #3573
Resolves: #3751 -- I thought I was going to pull this out, but went back and resolved it with a mapPipelined cursor.
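For readers unfamiliar with pipelined mapping, here is a minimal sketch of the general idea using only the JDK: apply an asynchronous function to each input while keeping a bounded number of futures outstanding, and return results in input order. The class `PipelinedMapSketch` and its `mapPipelined` helper are hypothetical illustrations, not the Record Layer cursor API or this PR's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration only; not the Record Layer cursor implementation.
public class PipelinedMapSketch {
    /**
     * Applies an asynchronous function to each input, keeping at most
     * pipelineSize futures outstanding, and returns results in input order.
     */
    public static <T, R> List<R> mapPipelined(Iterator<T> inputs,
                                              Function<T, CompletableFuture<R>> func,
                                              int pipelineSize) {
        if (pipelineSize < 1) {
            throw new IllegalArgumentException("pipelineSize must be at least 1");
        }
        Deque<CompletableFuture<R>> inFlight = new ArrayDeque<>();
        List<R> results = new ArrayList<>();
        while (inputs.hasNext() || !inFlight.isEmpty()) {
            // Start new work until the pipeline is full or the input is exhausted.
            while (inputs.hasNext() && inFlight.size() < pipelineSize) {
                inFlight.add(func.apply(inputs.next()));
            }
            // Wait for the oldest future so results keep the input order.
            results.add(inFlight.remove().join());
        }
        return results;
    }
}
```

This sketch blocks on `join()` for simplicity; a real cursor would stay asynchronous throughout.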

@ScottDugas ScottDugas added the enhancement New feature or request label Nov 9, 2025
@ScottDugas ScottDugas changed the title Keyspace import Introduce KeySpacePath.importData to import previously exported data Nov 9, 2025
@ScottDugas ScottDugas marked this pull request as ready for review November 10, 2025 16:08
}

// Store the data
byte[] keyBytes = keyTuple.pack();
Contributor

Should we add some fdb timer metrics for future use (imported_count)?

Collaborator Author

Yeah, I think a timer around importFuture makes sense.
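As a rough illustration of what timing a future can look like, the sketch below records a count and elapsed time when an asynchronous operation completes. It uses only the JDK; `ImportTimerSketch` and its methods are hypothetical names, and the Record Layer's own store timer facilities are deliberately not used here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical illustration only; the PR would use the Record Layer's own timer.
public class ImportTimerSketch {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** Starts the operation and records its duration once the future completes. */
    public <T> CompletableFuture<T> time(Supplier<CompletableFuture<T>> operation) {
        final long start = System.nanoTime();
        return operation.get().whenComplete((result, error) -> {
            // Recorded on both success and failure.
            count.incrementAndGet();
            totalNanos.addAndGet(System.nanoTime() - start);
        });
    }

    public long getCount() {
        return count.get();
    }

    public long getTotalNanos() {
        return totalNanos.get();
    }
}
```

Because the measurement happens in `whenComplete`, it covers the asynchronous work itself rather than just the time taken to create the future.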


verifySingleKey(dataPath, Tuple.from("item"), Tuple.from("final_value"));
}

Contributor

Additional potential tests:

  • Large data (or any out of band error) during import
  • Import into partial path (no leaves in import data) + some remainders
  • Import where data is of the wrong type

Collaborator Author

  1. Yes, a test of more data than can be inserted into a single transaction would make sense, but not if I move it to ResolvedKeySpacePath and just have it take a single DataInKeySpacePath.
  2. I'm not sure what you mean by a partial path.
  3. If by data you mean the value, there is no validation, and it is not KeySpacePath's responsibility to know what is in the data. If you mean the object in the path, that should be validated above this call, and should be trustworthy by the time you get a DataInKeySpacePath. Ideally this would be validated when you create the KeySpacePath, but it is covered in the serialization work, and I explain a bit more on the situation there: https://github.com/FoundationDB/fdb-record-layer/pull/3747/files#diff-15120b2e222e6bb7c2647b670f676b719cce8602e410487604bc87e9ea30a3b0R179

Contributor

By the second bullet I meant importing into the middle of the path, as when you have a path defined for /company/employee/id/profile and the import only has /company/employee.
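One reading of that scenario, sketched with plain lists of directory names: the import path is a proper prefix of the defined path, so it stops before reaching a leaf. `PartialPathSketch` and `isPartialPath` are hypothetical helpers for illustration and are not part of KeySpacePath.

```java
import java.util.List;

// Hypothetical illustration only; not part of the KeySpacePath API.
public class PartialPathSketch {
    /** Returns true when importPath is a proper prefix of definedPath. */
    public static boolean isPartialPath(List<String> definedPath, List<String> importPath) {
        return importPath.size() < definedPath.size()
                && definedPath.subList(0, importPath.size()).equals(importPath);
    }
}
```

With the defined path /company/employee/id/profile, an import path of /company/employee counts as partial under this reading, while the full four-element path does not.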

In doing this, I had to rework the test for overwriting data, and
decided it would be better to have 3 tests.
All of them now run both when copying back to the same cluster and
when copying between clusters.
/** The amount of time checking if a {@link com.google.common.collect.RangeSet} is empty. */
RANGE_SET_IS_EMPTY("range set is empty"),
/** The amount of time importing a single KeyValue into a path. */
IMPORT_DATA("import KeyValue"),
Contributor

Is this being used?

Collaborator

@alecgrieser alecgrieser left a comment

I think the only important thing is to make sure we set the executor in the fromIterator cursor; the rest of these are just informational/minor.

}

@ParameterizedTest
@EnumSource(CopyConfig.class)
Collaborator

Using the CopyConfig parameter like this is a little weird, but I can see how it's working. The alternative would, I think, be to introduce a new Extension that can select the source database and destination database for you, and then to set up the clusters appropriately in setUp so that the source matches the destination if and only if we're in CopyConfig.WithinCluster mode. I can buy that that's a bit too much machinery for this PR, if you wanted to separate that out (if you thought it was a good idea)

Collaborator Author

I'm not entirely sure I follow.
I wanted a handful of these tests to run both when copying within a cluster, and between clusters. A new extension would need to be smart enough to only apply to appropriate tests.
In my mind CopyConfig is just a slightly more readable version of @BooleanSource("withinCluster").
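To make the comparison concrete, here is a small sketch of a two-valued enum standing in for a boolean flag. The constant names and the `destinationFor` helper are assumptions for illustration; the PR's actual CopyConfig may be defined differently.

```java
// Hypothetical illustration only; constant names and helpers are assumed.
public class CopyConfigSketch {
    /** Stand-in for the test's CopyConfig enum. */
    public enum CopyConfig {
        WITHIN_CLUSTER,
        BETWEEN_CLUSTERS
    }

    /** Picks the destination cluster based on the enum value. */
    public static String destinationFor(CopyConfig config, String sourceCluster, String otherCluster) {
        return config == CopyConfig.WITHIN_CLUSTER ? sourceCluster : otherCluster;
    }

    /** The boolean form of the same choice, for comparison. */
    public static String destinationFor(boolean withinCluster, String sourceCluster, String otherCluster) {
        return withinCluster ? sourceCluster : otherCluster;
    }
}
```

A parameterized test annotated with @EnumSource would run once per constant, so the test name shows the mode by name rather than as true or false.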

Collaborator

The thing that's weird to me is that the class sets up the clusters to be different (if there's more than one around), and then the copy config overrides that. In my mind, it would be nice if setup set those two clusters consistently.

It is possible that there would be too much refactoring involved here to make that work in a way that only the relevant tests run in both modes.

ScottDugas and others added 2 commits December 1, 2025 15:24
Also, clean up some comment wording

Co-authored-by: Alec Grieser
@ScottDugas ScottDugas merged commit 564d9dd into FoundationDB:main Dec 2, 2025
8 checks passed
Development

Successfully merging this pull request may close these issues.

Improve handling of importing with DirectoryLayerDirectory
Add a KeySpacePath.import method to import the results of an export

3 participants